EAAI Submission
Data comes from the Eedi dataset, which contains 1.5 million interactions between students and tutors. The data is available here.
Here’s a quick look at the data.
We have 948 graphical questions. See an example below.
knitr::include_graphics("data/eedi/images/945.jpg")
Here’s what the training data looks like.
library(tidyverse) # read_csv(), %>%, and the dplyr/stringr verbs used below
library(gt)
read_csv("data/eedi/train_data/train_task_1_2.csv") %>% head() %>% gt() %>% tab_header(title = "Training data: Task 1 and 2")

Training data: Task 1 and 2

| QuestionId | UserId | AnswerId | IsCorrect | CorrectAnswer | AnswerValue |
|---|---|---|---|---|---|
| 16997 | 65967 | 12453206 | 0 | 4 | 2 |
| 16531 | 62121 | 15686710 | 1 | 1 | 1 |
| 15911 | 50013 | 13598796 | 0 | 3 | 1 |
| 1701 | 104909 | 10511925 | 0 | 4 | 3 |
| 22896 | 21748 | 941747 | 0 | 1 | 4 |
| 20968 | 67570 | 626158 | 1 | 4 | 4 |
train_data = read_csv("data/eedi/train_data/train_task_3_4.csv")
train_data %>% head() %>% gt() %>% tab_header(title = "Training data: Task 3 and 4")

Training data: Task 3 and 4

| QuestionId | UserId | AnswerId | IsCorrect | CorrectAnswer | AnswerValue |
|---|---|---|---|---|---|
| 898 | 2111 | 280203 | 1 | 2 | 2 |
| 767 | 3062 | 55638 | 1 | 3 | 3 |
| 165 | 1156 | 386475 | 1 | 2 | 2 |
| 490 | 1653 | 997498 | 1 | 4 | 4 |
| 298 | 3912 | 578636 | 1 | 3 | 3 |
| 791 | 5040 | 1335238 | 1 | 2 | 2 |
For each question, we know what it is about:
question_metadata = read_csv("data/eedi/metadata/question_metadata_task_1_2.csv")
question_metadata %>%
  head() %>%
  gt() %>%
  tab_header(title = "Question Metadata")

Question Metadata

| QuestionId | SubjectId |
|---|---|
| 13090 | [3, 32, 71, 77, 141, 185, 186, 214] |
| 1855 | [3, 71, 75, 86, 178] |
| 10423 | [3, 32, 38, 239] |
| 2290 | [3, 32, 33, 144] |
| 12785 | [3, 32, 33, 144] |
| 25704 | [3, 32, 33, 144] |
And we can map these codes back to subjects:
subject_metadata = read_csv("data/eedi/metadata/subject_metadata.csv")
subject_metadata %>%
  head() %>%
  gt() %>%
  tab_header(title = "Subject Metadata")

Subject Metadata

| SubjectId | Name | ParentId | Level |
|---|---|---|---|
| 3 | Maths | NULL | 0 |
| 32 | Number | 3 | 1 |
| 33 | BIDMAS | 144 | 3 |
| 34 | Upper and Lower Bounds | 141 | 3 |
| 35 | Calculator Use | 32 | 2 |
| 36 | Decimals | 32 | 2 |
We have metadata for the students:
student_metadata = read_csv("data/eedi/metadata/student_metadata_task_3_4.csv")
student_metadata %>%
  head() %>%
  gt() %>%
  tab_header(title = "Student Metadata")

Student Metadata

| UserId | Gender | DateOfBirth | PremiumPupil |
|---|---|---|---|
| 3837 | 2 | 2007-10-01 | 0 |
| 207 | 0 | NA | NA |
| 3581 | 2 | 2008-03-01 | 0 |
| 6028 | 2 | 2008-01-01 | 0 |
| 1451 | 0 | NA | NA |
| 6062 | 2 | 2008-01-01 | NA |
We also have metadata for each answer, which includes a confidence score.
answer_metadata = read_csv("data/eedi/metadata/answer_metadata_task_3_4.csv")
answer_metadata %>%
  head() %>%
  gt() %>%
  tab_header(title = "Answer Metadata")

Answer Metadata

| AnswerId | DateAnswered | Confidence | GroupId | QuizId | SchemeOfWorkId |
|---|---|---|---|---|---|
| 1451945 | 2019-10-30 14:34:00 | NA | 4 | 32 | 52562 |
| 45325 | 2020-01-06 18:53:00 | 75 | 185 | 66 | 52562 |
| 687013 | 2020-01-18 10:52:00 | NA | 235 | 64 | 52562 |
| 91254 | 2020-02-29 17:25:00 | NA | 194 | 97 | 52562 |
| 1225855 | 2020-03-06 15:07:00 | NA | 95 | 115 | 52562 |
| 1485952 | 2019-10-08 11:37:00 | NA | 374 | 68 | 52562 |
answer_metadata %>% count(is.na(Confidence)) %>% mutate(prop = n/sum(n))
# A tibble: 2 × 3
`is.na(Confidence)` n prop
<lgl> <int> <dbl>
1 FALSE 383096 0.254
2 TRUE 1125821 0.746
It seems to me that to generate conversations, we can use the following procedure:
Let’s work through an example, say the 1000th row in the data.
train_data %>% slice(1000) %>% gt()

| QuestionId | UserId | AnswerId | IsCorrect | CorrectAnswer | AnswerValue |
|---|---|---|---|---|---|
| 533 | 2918 | 723667 | 1 | 3 | 3 |
This person saw the question below, and answered C (3), the correct response.
knitr::include_graphics("data/eedi/images/533.jpg")
Let’s try to get the text out of the question (see ocr.py for the code).
What belongs in the spaces? ? 3)
Here’s what we know about that student: No age info, and gender 0
student_metadata %>% filter(UserId == 2918) %>% gt()

| UserId | Gender | DateOfBirth | PremiumPupil |
|---|---|---|---|
| 2918 | 0 | NA | NA |
Here’s what we know about the answer and its confidence: Not much!
answer_metadata %>% filter(AnswerId == 723667) %>% gt()

| AnswerId | DateAnswered | Confidence | GroupId | QuizId | SchemeOfWorkId |
|---|---|---|---|---|---|
| 723667 | 2019-11-16 13:48:00 | NA | 283 | 37 | 52562 |
What was the question about?
question_metadata %>%
  filter(QuestionId == 533) %>%
  pull(SubjectId) %>%
  # extract the numeric subject IDs from the bracketed string
  str_extract_all("[0-9]+", simplify = TRUE) %>%
  as.numeric() %>%
  enframe() %>%
  left_join(subject_metadata, by = c("value" = "SubjectId")) %>%
  gt()

| name | value | Name | ParentId | Level |
|---|---|---|---|---|
| 1 | 3 | Maths | NULL | 0 |
| 2 | 71 | Geometry and Measure | 3 | 1 |
| 3 | 88 | Properties of Quadrilaterals | 174 | 3 |
| 4 | 174 | 2D Names and Properties of Shapes | 71 | 2 |
And here is what we know from that student’s sequence. In that session, the student got the first question right and the second one wrong, and then tackled the question we are talking about now.
train_data %>%
  filter(UserId == 2918) %>%
  left_join(question_metadata, by = "QuestionId") %>%
  left_join(answer_metadata, by = "AnswerId") %>%
  arrange(DateAnswered) %>%
  filter(QuizId == 37) %>%
  gt() %>%
  # highlight the row of the question under discussion
  tab_style(
    style = list(
      cell_fill(color = "lightblue")
    ),
    locations = cells_body(
      rows = QuestionId == 533
    )
  ) %>%
  tab_header(title = "Sequence of questions")

Sequence of questions

| QuestionId | UserId | AnswerId | IsCorrect | CorrectAnswer | AnswerValue | SubjectId | DateAnswered | Confidence | GroupId | QuizId | SchemeOfWorkId |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 836 | 2918 | 641714 | 1 | 4 | 4 | [3, 49, 54, 260] | 2019-11-16 13:46:00 | NA | 283 | 37 | 52562 |
| 856 | 2918 | 1492796 | 0 | 3 | 2 | [3, 71, 98, 209] | 2019-11-16 13:47:00 | NA | 283 | 37 | 52562 |
| 533 | 2918 | 723667 | 1 | 3 | 3 | [3, 71, 88, 174] | 2019-11-16 13:48:00 | NA | 283 | 37 | 52562 |
| 502 | 2918 | 102841 | 1 | 1 | 1 | [3, 32, 39, 225] | 2019-11-16 13:49:00 | NA | 283 | 37 | 52562 |
| 409 | 2918 | 1383474 | 1 | 4 | 4 | [3, 32, 46, 240] | 2019-11-16 13:51:00 | NA | 283 | 37 | 52562 |
| 255 | 2918 | 225378 | 1 | 2 | 2 | [3, 32, 39, 225, 226] | 2019-11-16 13:51:00 | NA | 283 | 37 | 52562 |
| 642 | 2918 | 935137 | 0 | 2 | 1 | [3, 71, 95, 278] | 2019-11-16 13:52:00 | NA | 283 | 37 | 52562 |
| 91 | 2918 | 18414 | 0 | 4 | 3 | [3, 49, 54, 259] | 2019-11-16 13:52:00 | NA | 283 | 37 | 52562 |
| 319 | 2918 | 743896 | 0 | 2 | 1 | [3, 49, 53, 153] | 2019-11-16 13:53:00 | NA | 283 | 37 | 52562 |
| 551 | 2918 | 1267043 | 1 | 1 | 1 | [3, 49, 56, 68, 158, 163] | 2019-11-16 13:54:00 | NA | 283 | 37 | 52562 |
| 370 | 2918 | 1044951 | 1 | 2 | 2 | [3, 32, 39, 228] | 2019-11-16 13:55:00 | NA | 283 | 37 | 52562 |
| 670 | 2918 | 667543 | 0 | 3 | 1 | [3, 32, 144, 203] | 2019-11-16 13:56:00 | NA | 283 | 37 | 52562 |
| 862 | 2918 | 43104 | 0 | 2 | 3 | [3, 71, 75, 178] | 2019-11-16 13:57:00 | NA | 283 | 37 | 52562 |
| 844 | 2918 | 1465203 | 0 | 2 | 4 | [3, 49, 52, 153] | 2019-11-16 13:58:00 | NA | 283 | 37 | 52562 |
| 421 | 2918 | 1059241 | 1 | 4 | 4 | [3, 32, 141, 215, 216] | 2019-11-16 13:58:00 | NA | 283 | 37 | 52562 |
| 30 | 2918 | 150731 | 0 | 1 | 4 | [3, 49, 54, 158, 159, 264] | 2019-11-16 13:58:00 | NA | 283 | 37 | 52562 |
To get more diversity, I am pulling other psychological characteristics from hypothetical distributions.
I started generating some conversations with ChatGPT using the following prompt:
You are a simulator for an intelligent tutoring system. I will tell you a math question a student is working on, some information about the student and their interaction with the question, and you will generate a conversation between the student and the intelligent tutor. I will tell you some psychological characteristics of the student, but the tutor does not know them explicitly, they are there just so that you can more accurately model the student. (All psychological variables are specified as z-scores, such that 0 is the mean and 1 is one standard deviation above the mean). Make sure that the student is asking for some help.
Question text: ‘What belongs in the spaces? ? 3)’
Did the student get the question right? Yes
What was the student’s confidence? Unknown
What was the sequence of student work: correct, incorrect, {now}
Student age: Unknown - (we can pull from the distribution if Unknown)
Student gender: Unknown
Student IQ: .75
Student Extraversion: .23
Student Agreeableness: -.53
Student Conscientiousness: .63
Student Neuroticism: 1.2
Student Openness: .03
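The field block of this prompt could be filled in programmatically from one row of the merged data. Below is a minimal Python sketch under stated assumptions: the function names, the record keys, and the choice of a standard normal for the trait z-scores are all illustrative and not part of the existing code base.

```python
import random

# Trait names as they appear in the prompt above.
TRAITS = ["IQ", "Extraversion", "Agreeableness",
          "Conscientiousness", "Neuroticism", "Openness"]


def sample_traits(rng):
    """Draw each trait as a z-score from a standard normal (an assumption)."""
    return {trait: round(rng.gauss(0.0, 1.0), 2) for trait in TRAITS}


def build_prompt_fields(record, rng):
    """Format one student-question record as the field block of the prompt."""

    def show(value):
        # Missing values are reported to the model as "Unknown".
        return "Unknown" if value is None else str(value)

    steps = ["correct" if ok else "incorrect" for ok in record["previous_correct"]]
    sequence = ", ".join(steps + ["{now}"])
    lines = [
        f"Question text: '{record['question_text']}'",
        f"Did the student get the question right? {'Yes' if record['is_correct'] else 'No'}",
        f"What was the student's confidence? {show(record['confidence'])}",
        f"What was the sequence of student work: {sequence}",
        f"Student age: {show(record['age'])}",
        f"Student gender: {show(record['gender'])}",
    ]
    lines += [f"Student {trait}: {z}" for trait, z in sample_traits(rng).items()]
    return "\n".join(lines)


# Hypothetical record mirroring the worked example (row 1000).
record = {
    "question_text": "What belongs in the spaces? ? 3)",
    "is_correct": True,
    "confidence": None,
    "previous_correct": [True, False],
    "age": None,
    "gender": None,
}
print(build_prompt_fields(record, random.Random(0)))
```

Passing the random generator in explicitly keeps the sampled traits reproducible, so the same student profile can be regenerated later.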
This produced the following interaction:
Student (S): I’m working on this question: “What belongs in the spaces? ? 3)”. I initially thought I had the right answer, but I made a mistake in my second attempt. Can you help me figure it out?
Intelligent Tutor (IT): Of course! I see you’ve had a go at it, which is excellent. Can you share your thought process on how you approached it the first time and what changed in your second attempt?
S: In my first approach, I tried breaking down the equation, but in my second attempt, I second-guessed myself and tried another method which didn’t work out.
IT: It’s completely normal to second-guess oneself, especially when faced with challenging problems. For this equation, I’d suggest we go step-by-step. Can you try to articulate the problem, just like you did in your first attempt?
S: Okay, it seems like the equation is something like “t plus 2 times something minus 3 equals 4.” So, I’m trying to find that unknown value.
IT: That’s right! Let’s use that framework to solve the problem. Remember, sometimes the simplest approach can be the correct one.
S: I think I get it now, but I still feel a bit unsure about my answers sometimes. I wish I was more confident in my abilities.
IT: Many students feel that way. It’s essential to remember that everyone learns at their own pace, and making mistakes is a part of the learning process. It’s more important to understand the concept than to always get the right answer the first time. If you ever feel uncertain about an answer, it’s always a good idea to revisit the fundamentals of the topic. Would you like some additional resources or practice problems on writing and simplifying expressions?
S: Yes, that would be helpful. I think more practice might make me feel more confident.
IT: Absolutely! Practice can enhance your understanding and boost your confidence. Remember, every challenge you face in learning is an opportunity for growth. And I’m here to help guide you through any difficulties you encounter. Let’s keep going!
We can deploy this at scale using the API. Here’s a script that generates conversations.
Code to be written.
From running this, we create a dataset, which we can use to identify motivational states. See data/conversations.jsonl for the resulting dataset.
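Until that script is written, its overall shape might look like the following Python sketch. It is an illustration only: `generate_conversations` and `fake_complete` are hypothetical names, and `complete` stands in for whatever API call ends up being used, so no particular client library is assumed.

```python
import json


def generate_conversations(prompts, complete, out_path):
    """Call `complete` on each prompt and write one JSON object per line (JSONL).

    `complete` is any function that maps a prompt string to the generated
    conversation text; in practice it would wrap the API call.
    """
    n_written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for i, prompt in enumerate(prompts):
            conversation = complete(prompt)
            row = {"id": i, "prompt": prompt, "conversation": conversation}
            f.write(json.dumps(row) + "\n")
            n_written += 1
    return n_written


def fake_complete(prompt):
    """Placeholder generator, useful for dry runs without API access."""
    return "S: Can you help me with this question?\nIT: Of course!"
```

A dry run would be `generate_conversations(prompts, fake_complete, "data/conversations.jsonl")`, with `fake_complete` later swapped for the real API wrapper.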
We can use the dataset to identify motivational states. We can use the following code to identify the most common motivational states.
There are a few ways in which we can go about this. Perhaps the simplest but most expensive one is to have chatGPT read the conversations and score them.
Take the following prompt:
You are a classroom observer. You will read a conversation between a student and an intelligent tutoring system. Please score the student on the following motivational states on a scale from 0 - 100:
Confidence: How confident is the student in their ability to solve the problem?
Frustration: How frustrated is the student with their learning experience?
Boredom: How bored is the student with their learning experience?
Curiosity/Interest: How interested/curious is the student about the topic?
Engagement: How engaged is the student with the learning experience?
Conversation: {conversation}
This produces the following output:
Based on the conversation, here’s how I would score the student on the given motivational states (on a scale from 0 - 100):
Confidence: 60 - The student does show some understanding of the problem and articulates their thought process, but they also express a wish for more confidence in their abilities. The score isn’t extremely low because they did have an initial correct attempt and were willing to articulate their approach.
Frustration: 40 - The student isn’t overtly frustrated but does exhibit signs of being somewhat frustrated as they mention second-guessing themselves and feeling unsure about their answers.
Boredom: 10 - The student doesn’t seem to display signs of boredom. They are actively participating in the conversation and seeking help, which suggests they are somewhat invested in understanding the material.
Curiosity/Interest: 70 - The student is actively engaging in the conversation, asking for help, and expressing a desire for more practice and resources, indicating a decent level of curiosity and interest in the topic.
Engagement: 85 - The student is very engaged in the learning experience. They discuss their thought process, attempt to solve the problem with guidance, and express a desire for further resources and practice.
These scores are based on the specific conversation provided and are subjective to interpretation. Different observers might score the conversation slightly differently based on their own perceptions and experiences.
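To use responses like this one at scale, the numeric scores have to be pulled out of the free text. Here is a minimal Python sketch, assuming each state appears at the start of a line as `Name: NN` as in the output above; `parse_scores` is a hypothetical helper, and a real run may need to handle other layouts.

```python
import re

STATES = ["Confidence", "Frustration", "Boredom", "Curiosity/Interest", "Engagement"]


def parse_scores(text):
    """Return {state: score} for every state whose line starts with 'State: <0-100>'."""
    scores = {}
    for state in STATES:
        pattern = rf"^\s*{re.escape(state)}:\s*(\d{{1,3}})\b"
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match:
            value = int(match.group(1))
            if 0 <= value <= 100:  # ignore out-of-range numbers
                scores[state] = value
    return scores


# Abbreviated version of the response shown above.
example_output = (
    "Based on the conversation, here's how I would score the student:\n"
    "Confidence: 60 - The student does show some understanding.\n"
    "Frustration: 40 - The student isn't overtly frustrated.\n"
    "Boredom: 10 - The student doesn't seem bored.\n"
    "Curiosity/Interest: 70 - The student is actively engaging.\n"
    "Engagement: 85 - The student is very engaged.\n"
)
print(parse_scores(example_output))
```

States that cannot be parsed are simply missing from the result, which makes incomplete responses easy to detect and re-request.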
Again, this is still to be written, but we can easily use the API to scale this up.
Another option is using a classifier to identify the motivational states. We can use the following code to train a classifier.
I have a code base to go from a dataset of annotated conversations to a BERT model to classify motivational states.
A final option is to use semantic similarity to find motivational states.
from sentence_transformers import SentenceTransformer, util
# Load the model
model = SentenceTransformer('paraphrase-distilroberta-base-v1')
# Read the data
with open("data/conversation.txt", "r") as f:
    conversation = f.read()
# Keep only the student's turns. Select them by speaker prefix rather than by
# line position, so the result does not depend on who speaks first or on blank lines.
prefixes = ("Student (S): ", "S: ")
student_utterances = []
for line in conversation.split("\n"):
    line = line.strip()
    for prefix in prefixes:
        if line.startswith(prefix):
            student_utterances.append(line[len(prefix):])
            break
# Compute the utterance embeddings once and reuse them for every state
utterance_embeddings = model.encode(student_utterances)
# Target sentences, one per motivational state
targets = {
    "confidence": "I'm sure I can solve this problem on my own.",
    "frustration": "I've tried solving this problem so many times, but I just can't get the right answer. I don't know what I'm doing wrong!",
    "boredom": "This topic doesn't interest me at all. I've seen it so many times before, and it just feels repetitive.",
    "curiosity": "I'm intrigued by this concept. Can you recommend any additional resources or examples to explore it further?",
    "engagement": "I'd love to dive deeper into this topic and understand the underlying principles. Can we discuss it in more detail?",
}
# Cosine similarity between every student utterance and each target embedding
similarity = {
    state: util.pytorch_cos_sim(utterance_embeddings, model.encode(sentence))
    for state, sentence in targets.items()
}
We can track the similarity of student language to our target embeddings for each utterance. We can average the similarities over time, or weight the more recent ones more heavily. See the graph below for the resulting similarity over time, for each motivational state. We can also refine the target embeddings, and compute similarity over a sample of target sentences rather than a single one for each state.
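Weighting recent utterances more heavily can be done with an exponentially decaying weight. The sketch below is one possible choice, not the method used for the graph; `recency_weighted_mean` and the `decay` value are assumptions for illustration.

```python
def recency_weighted_mean(similarities, decay=0.7):
    """Average per-utterance similarities, down-weighting older utterances.

    The most recent utterance gets weight 1, the one before it `decay`,
    then `decay ** 2`, and so on. With decay = 1 this is the plain mean.
    """
    n = len(similarities)
    if n == 0:
        raise ValueError("need at least one similarity value")
    weights = [decay ** (n - 1 - t) for t in range(n)]
    weighted_sum = sum(w * s for w, s in zip(weights, similarities))
    return weighted_sum / sum(weights)


# Hypothetical similarities to the "frustration" target, oldest first.
frustration_over_time = [0.10, 0.15, 0.40, 0.55]
print(recency_weighted_mean(frustration_over_time))
```

A smaller `decay` makes the estimate react faster to the latest utterance, at the cost of a noisier signal.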
knitr::include_graphics("output/pilot/embedding similarity.png")
We can simulate a range of learning outcomes by using a reinforcement learning framework with some assumptions.
We can use the following system to simulate how a motivationally aware tutor guides a student’s learning, compared to a baseline.
When a student encounters an activity, they have a probability of knowing the answer if they try. This depends on their skill level and the difficulty of the activity.
\(P(K) = f(S,D)\)
When a student faces an activity they can try, guess, or quit. This depends on their skill level, the difficulty of the task, and their motivational state.
\(P(E) = f(S,D,M)\)
If a student tries, their probability of getting the question correct depends on their skill level, the difficulty of the task, and their motivation.
\(P(Correct|Try) = f(S,D,M)\)
If a student gets the question correct, they can learn from it. This depends on their skill level, the difficulty of the task, and their motivation.
\(P(Learn|Correct) = f(S,D,M)\)
If a student guesses, or gets the answer wrong, they cannot learn:
\(P(Learn|Incorrect) = 0\) \(P(Learn|Guess) = 0\)
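For concreteness, the simulation later in this document instantiates these functions with logistic forms. Transcribed from that code (the constants 0.05 and 0.01 and the clipping bounds are the placeholder values used there, not estimates), they read:

```latex
\begin{align*}
P(K) &= \frac{1}{1 + e^{-(S - D)}} \\
P(\text{Try}) &= \min\!\left(\max\!\left(\frac{1}{1 + e^{-0.05\,(S + M - D)}},\ 0.05\right),\ 0.95\right) \\
P(\text{Guess}) &= P(\text{Quit}) = \frac{1 - P(\text{Try})}{2} \\
P(\text{Correct} \mid \text{Try}) &= \frac{1}{1 + e^{-(P(K) + 0.01\,M)}}
\end{align*}
```

In that code, every correct attempt raises S by a draw from a normal distribution with mean 0.5 and standard deviation 0.1, so \(P(Learn|Correct)\) is effectively 1 for now.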
Initialize student skill S at random, or from prior data if available.
Choose D based on S.
While quit is false:
    present activity A with difficulty D to the student.
    observe the correctness of the student's response.
    observe the language of the student's response.
    update M from the language of the student's response.
    update S from the correctness of the student's response.
    pick a difficulty D for the next problem, using S and M, and serve it to the student.
The baseline does the same, but does not use M to choose D.
Still needs work. Mostly, we need a model of how M changes over time. I’d say M goes down over time, but it also depends on probability_student_knows: if you get a question right that you were expected to get right, it might not be very motivating; on the other hand, if you try and get a hard question right (e.g., \(p\) was low), that might be motivating. Choosing to guess also lowers motivation.
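One way to turn these intuitions into an update rule is sketched below in Python. Everything here is an assumption made for illustration: the function name, the linear form, and the parameter values (`decay`, `surprise_gain`, `guess_penalty`) are placeholders, and the text does not say what happens after a failed attempt, so that case only applies the decay.

```python
def update_motivation(M, outcome, correct, p_knows,
                      decay=1.0, surprise_gain=10.0, guess_penalty=2.0):
    """Return the new motivation after one activity.

    M        current motivation
    outcome  "try" or "guess"
    correct  whether the answer was right
    p_knows  probability the student knew the answer (probability_student_knows)
    """
    M = M - decay  # motivation drifts down over time
    if outcome == "guess":
        return M - guess_penalty  # choosing to guess lowers motivation
    if correct:
        # Success is more motivating the less it was expected.
        return M + surprise_gain * (1.0 - p_knows)
    return M  # failed attempt: only the decay applies (assumption)


print(update_motivation(100.0, "try", True, p_knows=0.1))   # hard question, solved
print(update_motivation(100.0, "try", True, p_knows=0.9))   # easy question, solved
print(update_motivation(100.0, "guess", True, p_knows=0.5)) # guessed
```

With these placeholder values, solving a hard question raises M, solving an easy one roughly cancels the decay, and guessing lowers M.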
We are also not yet modeling changes in motivation due to ITS actions. If we assume that the ITS can influence motivation, then when the motivation-aware ITS (M-ITS) observes that M is getting too low, it could show a motivation prompt instead of a math question; the prompt produces no learning outcome, but increases M.
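The M-ITS decision described here could be sketched as a simple threshold policy. This is a Python illustration under assumptions: the names, the threshold of 60, and the boost of 15 are hypothetical and would need to be tuned or estimated.

```python
def choose_action(M, threshold=60.0):
    """Show a motivation prompt when motivation is low, otherwise a math question."""
    return "motivation_prompt" if M < threshold else "math_question"


def apply_motivation_prompt(S, M, boost=15.0):
    """A motivation prompt leaves skill S unchanged and raises motivation M."""
    return S, M + boost


S, M = 125.0, 52.0
if choose_action(M) == "motivation_prompt":
    S, M = apply_motivation_prompt(S, M)
print(S, M)
```

The baseline ITS would skip this check and always serve a math question.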
set.seed(34)
# Setting parameters
initial_skill <- 120
skill_variance <- 10 # used as the standard deviation when drawing a difficulty
initial_motivation <- 100
motivation_variance <- 10 # used as the standard deviation when drawing M
motivation_boost <- 0.01
initial_difficulty <- rnorm(1, initial_skill, skill_variance)
# Standard deviation of the noise added to motivation; abs() keeps it non-negative
measurement_error <- abs(rnorm(1, 0, 5))
# Initialize
S <- initial_skill
M <- rnorm(1, initial_motivation, motivation_variance)
D <- initial_difficulty
engagement_outcome <- "engage"
iteration <- 1 # iteration counter
# ITS recommender simulation
while (engagement_outcome != "quit") {
  print(paste("Iteration:", iteration))
  print(paste("Skill (S):", S))
  print(paste("Motivation (M):", M))
  print(paste("Difficulty (D):", D))
  # Calculate probability student knows answer
  skill_minus_difficulty <- S - D
  probability_student_knows <- 1 / (1 + exp(-skill_minus_difficulty))
  # Determine engagement probabilities
  P_engage <- 1 / (1 + exp(-0.05 * (S + M - D)))
  P_engage <- min(max(P_engage, 0.05), 0.95) # Clipping P_engage to lie within [0.05, 0.95]
  P_guess <- (1 - P_engage) / 2
  P_quit <- 1 - P_engage - P_guess
  # Determine engagement outcome
  engagement_outcome <- sample(c("try", "guess", "quit"), 1, prob = c(P_engage, P_guess, P_quit))
  print(paste("Engagement outcome:", engagement_outcome))
  # Update M and S based on outcome
  if (engagement_outcome == "try") {
    probability_correct_try <- 1 / (1 + exp(-(probability_student_knows + motivation_boost * M)))
    correct <- sample(c(TRUE, FALSE), 1, prob = c(probability_correct_try, 1 - probability_correct_try))
    print(paste("Tried and got the answer", ifelse(correct, "correct", "incorrect")))
    if (correct) {
      S <- S + rnorm(1, 0.5, 0.1) # Assumption: increase skill slightly if correct
      M <- M + rnorm(1, 0, measurement_error) # Placeholder: zero-mean noise on motivation
    } else {
      M <- M - rnorm(1, 0, measurement_error) # Placeholder: zero-mean noise, no systematic drop yet
    }
  } else if (engagement_outcome == "guess") {
    correct <- sample(c(TRUE, FALSE), 1, prob = c(0.5, 0.5)) # 50% chance to get it right by guessing
    print(paste("Guessed and got the answer", ifelse(correct, "correct", "incorrect")))
    if (correct) {
      M <- M - rnorm(1, 0, measurement_error) # Placeholder: zero-mean noise after a correct guess
    } else {
      M <- M - rnorm(1, 0, measurement_error) # Placeholder: zero-mean noise after a wrong guess
    }
  }
  # Choose the next difficulty around the current skill estimate (M is not used yet)
  D <- rnorm(1, S, skill_variance)
  iteration <- iteration + 1
  print("") # Blank line for better readability
}
[1] "Iteration: 1"
[1] "Skill (S): 120"
[1] "Motivation (M): 92.522775981045"
[1] "Difficulty (D): 118.611100288291"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 2"
[1] "Skill (S): 120.473641848657"
[1] "Motivation (M): 89.7902492052396"
[1] "Difficulty (D): 127.17984229338"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 3"
[1] "Skill (S): 120.473641848657"
[1] "Motivation (M): 83.3904201962166"
[1] "Difficulty (D): 120.399036513156"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 4"
[1] "Skill (S): 120.473641848657"
[1] "Motivation (M): 79.0764452981371"
[1] "Difficulty (D): 118.673055306185"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 5"
[1] "Skill (S): 121.013767341498"
[1] "Motivation (M): 87.2135166356191"
[1] "Difficulty (D): 121.206033736084"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 6"
[1] "Skill (S): 121.329501152056"
[1] "Motivation (M): 85.5353327080801"
[1] "Difficulty (D): 106.021805123206"
[1] "Engagement outcome: guess"
[1] "Guessed and got the answer correct"
[1] ""
[1] "Iteration: 7"
[1] "Skill (S): 121.329501152056"
[1] "Motivation (M): 92.0259051081105"
[1] "Difficulty (D): 107.08352309359"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 8"
[1] "Skill (S): 121.329501152056"
[1] "Motivation (M): 87.3384275016425"
[1] "Difficulty (D): 112.335023400845"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 9"
[1] "Skill (S): 121.805563758213"
[1] "Motivation (M): 94.3453344406226"
[1] "Difficulty (D): 113.824177260716"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 10"
[1] "Skill (S): 121.805563758213"
[1] "Motivation (M): 101.67292882441"
[1] "Difficulty (D): 121.771352896388"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 11"
[1] "Skill (S): 122.255430357959"
[1] "Motivation (M): 102.749385349673"
[1] "Difficulty (D): 122.857090874774"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 12"
[1] "Skill (S): 122.753045239583"
[1] "Motivation (M): 98.6081727441642"
[1] "Difficulty (D): 116.807890393747"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 13"
[1] "Skill (S): 123.073628971494"
[1] "Motivation (M): 97.8676679361294"
[1] "Difficulty (D): 131.830547251739"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 14"
[1] "Skill (S): 123.762018437271"
[1] "Motivation (M): 93.1096585726148"
[1] "Difficulty (D): 131.365194752917"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 15"
[1] "Skill (S): 124.151099270629"
[1] "Motivation (M): 88.574105822155"
[1] "Difficulty (D): 123.309856983051"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 16"
[1] "Skill (S): 124.74736782285"
[1] "Motivation (M): 82.4205961360577"
[1] "Difficulty (D): 137.391804233923"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 17"
[1] "Skill (S): 125.370779296536"
[1] "Motivation (M): 76.2580509868518"
[1] "Difficulty (D): 139.401232895492"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 18"
[1] "Skill (S): 125.691899398066"
[1] "Motivation (M): 71.9119908969106"
[1] "Difficulty (D): 128.74026566609"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 19"
[1] "Skill (S): 125.691899398066"
[1] "Motivation (M): 67.827270303239"
[1] "Difficulty (D): 132.396577749027"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 20"
[1] "Skill (S): 126.241986619798"
[1] "Motivation (M): 63.0271730982994"
[1] "Difficulty (D): 120.969611045667"
[1] "Engagement outcome: guess"
[1] "Guessed and got the answer correct"
[1] ""
[1] "Iteration: 21"
[1] "Skill (S): 126.241986619798"
[1] "Motivation (M): 65.5665703673201"
[1] "Difficulty (D): 115.296023486256"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 22"
[1] "Skill (S): 126.241986619798"
[1] "Motivation (M): 74.7319971940933"
[1] "Difficulty (D): 112.090065904627"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 23"
[1] "Skill (S): 126.780347365827"
[1] "Motivation (M): 82.3964585619134"
[1] "Difficulty (D): 143.423818715256"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 24"
[1] "Skill (S): 127.315239337345"
[1] "Motivation (M): 88.1715965466303"
[1] "Difficulty (D): 122.754468834846"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 25"
[1] "Skill (S): 127.769342855569"
[1] "Motivation (M): 90.9454164171349"
[1] "Difficulty (D): 128.526428967909"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 26"
[1] "Skill (S): 128.333943366362"
[1] "Motivation (M): 101.142359297115"
[1] "Difficulty (D): 108.057870854171"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 27"
[1] "Skill (S): 128.906716759848"
[1] "Motivation (M): 100.198319024691"
[1] "Difficulty (D): 129.574799494451"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 28"
[1] "Skill (S): 128.906716759848"
[1] "Motivation (M): 110.114480452094"
[1] "Difficulty (D): 125.380512750348"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 29"
[1] "Skill (S): 129.504220703225"
[1] "Motivation (M): 105.473416230504"
[1] "Difficulty (D): 145.20375193197"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 30"
[1] "Skill (S): 129.994655587875"
[1] "Motivation (M): 118.220010572565"
[1] "Difficulty (D): 146.432735579331"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 31"
[1] "Skill (S): 129.994655587875"
[1] "Motivation (M): 115.974857017408"
[1] "Difficulty (D): 107.629576181609"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 32"
[1] "Skill (S): 130.482590708718"
[1] "Motivation (M): 120.017161286858"
[1] "Difficulty (D): 140.635737909291"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 33"
[1] "Skill (S): 130.946384435504"
[1] "Motivation (M): 118.153828768674"
[1] "Difficulty (D): 128.681395536205"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 34"
[1] "Skill (S): 131.618154959926"
[1] "Motivation (M): 124.972153083379"
[1] "Difficulty (D): 119.527547786721"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 35"
[1] "Skill (S): 132.101998345333"
[1] "Motivation (M): 125.850593500906"
[1] "Difficulty (D): 108.280221970644"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 36"
[1] "Skill (S): 132.490133295452"
[1] "Motivation (M): 144.236518603129"
[1] "Difficulty (D): 128.339029895298"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 37"
[1] "Skill (S): 133.055283600042"
[1] "Motivation (M): 149.202494104111"
[1] "Difficulty (D): 147.974229140968"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 38"
[1] "Skill (S): 133.507352863446"
[1] "Motivation (M): 156.755332351992"
[1] "Difficulty (D): 124.62841655251"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 39"
[1] "Skill (S): 134.037383858263"
[1] "Motivation (M): 160.496502830903"
[1] "Difficulty (D): 136.197042532566"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 40"
[1] "Skill (S): 134.565992543945"
[1] "Motivation (M): 153.174042201684"
[1] "Difficulty (D): 146.151470918972"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 41"
[1] "Skill (S): 135.058367822451"
[1] "Motivation (M): 146.132570209439"
[1] "Difficulty (D): 136.541180559682"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 42"
[1] "Skill (S): 135.504104849046"
[1] "Motivation (M): 148.984702342995"
[1] "Difficulty (D): 129.302440260269"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 43"
[1] "Skill (S): 135.951397053593"
[1] "Motivation (M): 140.357757338377"
[1] "Difficulty (D): 120.848460011321"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 44"
[1] "Skill (S): 136.487051471424"
[1] "Motivation (M): 146.920815399861"
[1] "Difficulty (D): 132.809853553358"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 45"
[1] "Skill (S): 136.816580108482"
[1] "Motivation (M): 153.355232497589"
[1] "Difficulty (D): 142.776249342971"
[1] "Engagement outcome: try"
[1] "Tried and got the answer incorrect"
[1] ""
[1] "Iteration: 46"
[1] "Skill (S): 136.816580108482"
[1] "Motivation (M): 153.859389036659"
[1] "Difficulty (D): 128.734220966767"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 47"
[1] "Skill (S): 137.473270590598"
[1] "Motivation (M): 162.199431100175"
[1] "Difficulty (D): 137.195864407006"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 48"
[1] "Skill (S): 137.956366611053"
[1] "Motivation (M): 167.083497338252"
[1] "Difficulty (D): 151.781448068811"
[1] "Engagement outcome: guess"
[1] "Guessed and got the answer correct"
[1] ""
[1] "Iteration: 49"
[1] "Skill (S): 137.956366611053"
[1] "Motivation (M): 164.058259029736"
[1] "Difficulty (D): 136.962439805505"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 50"
[1] "Skill (S): 138.456072582959"
[1] "Motivation (M): 161.074553237652"
[1] "Difficulty (D): 137.903424484671"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 51"
[1] "Skill (S): 138.968584073999"
[1] "Motivation (M): 154.785584420462"
[1] "Difficulty (D): 144.778910935767"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 52"
[1] "Skill (S): 139.60886674453"
[1] "Motivation (M): 149.33014027247"
[1] "Difficulty (D): 138.117433865439"
[1] "Engagement outcome: try"
[1] "Tried and got the answer correct"
[1] ""
[1] "Iteration: 53"
[1] "Skill (S): 140.119274772722"
[1] "Motivation (M): 143.754352544077"
[1] "Difficulty (D): 139.063768285554"
[1] "Engagement outcome: quit"
[1] ""
print(paste("Final skill:", S))
[1] "Final skill: 140.119274772722"
print(paste("Final motivation:", M))
[1] "Final motivation: 143.754352544077"
Let’s start by merging all the data so that we can filter for completeness. After dropping students with missing metadata (date of birth, gender, premium status) and students who never gave a confidence rating, 1.3M rows become 660K.
complete_data = train_data %>%
left_join(question_metadata, by = c("QuestionId" = "QuestionId")) %>%
left_join(answer_metadata, by = c("AnswerId" = "AnswerId")) %>%
left_join(student_metadata, by = c("UserId" = "UserId")) %>%
# drop incomplete students
filter(!is.na(DateOfBirth), !is.na(Gender), !is.na(PremiumPupil)) %>%
group_by(UserId) %>%
mutate(
average_confidence = mean(Confidence, na.rm = TRUE),
confidence_response_rate = sum(!is.na(Confidence))/n(),
average_correctness = mean(IsCorrect, na.rm = TRUE),
) %>%
ungroup() %>%
filter(!is.na(average_confidence))
From this data, I will sample 50 students with high confidence completeness (confidence response rate > 0.80, which drops the data to 97K).
set.seed(18)
data50 = complete_data %>%
filter(confidence_response_rate > 0.8) %>%
group_by(UserId) %>%
nest() %>%
ungroup() %>%
slice_sample(n = 50) %>%
unnest(cols = c(data)) %>%
arrange(UserId, QuizId,DateAnswered) %>%
group_by(UserId,QuizId) %>%
mutate(
cum_percent_correct = cumsum(IsCorrect)/row_number(),
cum_percent_confident = cumsum(Confidence)/row_number(),
problems_so_far = row_number(),
# age in years: difference in days divided by the average year length
student_age = (difftime(DateAnswered, DateOfBirth, units = "days") %>% as.numeric())/365.25
)
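The nest / `slice_sample()` / unnest sequence above is what makes the draw happen at the student level rather than the row level. A minimal sketch of that idiom on toy data (the tibble `toy`, its values, and the seed are made up for illustration and are not part of the Eedi data):

```r
library(dplyr)
library(tidyr)

set.seed(1)  # arbitrary seed, only for this toy example

# Toy data: 5 hypothetical students with 3 answers each
toy <- tibble(
  UserId = rep(1:5, each = 3),
  IsCorrect = rbinom(15, 1, 0.5)
)

sampled <- toy %>%
  group_by(UserId) %>%
  nest() %>%               # collapse to one row per student
  ungroup() %>%
  slice_sample(n = 2) %>%  # draw 2 students, not 2 rows
  unnest(cols = c(data))   # restore every row of the sampled students

n_distinct(sampled$UserId)  # number of students kept
nrow(sampled)               # all of their rows are retained
```

Calling `slice_sample(n = 2)` directly on `toy` would instead return two individual answers, possibly from the same student.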
set.seed(352)
data50 = data50 %>%
group_by(UserId) %>%
slice_sample(n = 1) %>%
ungroup() %>%
select(UserId, Gender, student_age, QuizId, QuestionId, SubjectId, IsCorrect, Confidence, average_confidence, average_correctness, cum_percent_correct, cum_percent_confident, problems_so_far) %>%
mutate(Gender = case_match(Gender,
1 ~ "Male",
2 ~ "Female"
)) %>%
rowwise() %>%
# let's pull random values for the psychological variables (z-scores)
mutate(
iq = rnorm(1, 0, 1),
extraversion = rnorm(1, 0, 1),
agreeableness = rnorm(1, 0, 1),
conscientiousness = rnorm(1, 0, 1),
neuroticism = rnorm(1, 0, 1),
openness = rnorm(1, 0, 1)
) %>%
ungroup() %>%
# and for motivational states (0 - 10)
mutate(
confidence = sample(0:10, n(), replace = TRUE),
frustration = sample(0:10, n(), replace = TRUE),
boredom = sample(0:10, n(), replace = TRUE),
curiosity = sample(0:10, n(), replace = TRUE),
engagement = sample(0:10, n(), replace = TRUE)
)
questions = data50$QuestionId
subjects_to_words = function(text){
text = gsub("\\[|\\]", "", text)
text = str_split(text, ",")[[1]]
text = as.numeric(text)
text = subject_metadata %>% filter(SubjectId %in% text) %>% pull(Name)
text = paste(text, collapse = ", ")
return(text)
}
data50 = data50 %>%
rowwise() %>%
mutate(
subject_text = map_chr(SubjectId, function(x) subjects_to_words(x))
) %>%
select(-SubjectId)
Here is the resulting group of students, with fictional data and preset psychological variables.
data50 %>%
gt() %>%
fmt_auto() %>%
fmt_number(columns = c(iq:openness, average_correctness, cum_percent_correct), decimals = 2)
| UserId | Gender | student_age | QuizId | QuestionId | IsCorrect | Confidence | average_confidence | average_correctness | cum_percent_correct | cum_percent_confident | problems_so_far | iq | extraversion | agreeableness | conscientiousness | neuroticism | openness | confidence | frustration | boredom | curiosity | engagement | subject_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 208 | Female | 12 | 73 | 696 | 0 | 50 | 72 | 0.65 | 0.71 | NA | 7 | 0.06 | 1.60 | −1.08 | 2.11 | −1.38 | 1.59 | 10 | 6 | 6 | 2 | 3 | Maths, Number, Geometry and Measure, Units of Measurement, Rounding and Estimating, Length Units, Estimation |
| 246 | Male | 11 | 18 | 87 | 0 | 75 | 62 | 0.45 | 0.00 | 75 | 2 | 2.16 | −1.09 | 0.89 | −0.45 | −0.78 | 0.45 | 5 | 1 | 5 | 3 | 3 | Maths, Geometry and Measure, Perimeter, Perimeter and Area |
| 319 | Male | 12 | 107 | 135 | 0 | 75 | 65 | 0.59 | 0.50 | 79 | 6 | −0.48 | −0.48 | −0.19 | −0.44 | −0.07 | 0.88 | 7 | 6 | 9 | 8 | 0 | Maths, Algebra, Algebraic Fractions, Adding and Subtracting Algebraic Fractions |
| 350 | Male | 12 | 32 | 819 | 0 | 25 | 36 | 0.59 | 0.50 | 58 | 10 | −1.00 | 0.80 | 0.36 | 2.00 | 1.65 | −0.64 | 3 | 2 | 9 | 6 | 6 | Maths, Algebra, Straight Line Graphs, Parallel Lines |
| 356 | Female | 12 | 85 | 146 | 0 | 50 | 57 | 0.43 | 0.75 | 81 | 4 | 0.63 | 0.77 | −0.51 | 0.34 | 1.05 | 0.96 | 7 | 6 | 0 | 8 | 5 | Maths, Geometry and Measure, Angles, Bearings |
| 357 | Male | 12 | 6 | 526 | 0 | 75 | 79 | 0.48 | 0.80 | 75 | 5 | 0.24 | 1.13 | 0.84 | 0.63 | 1.85 | 1.09 | 3 | 0 | 1 | 8 | 9 | Maths, Geometry and Measure, Units of Measurement, Volume and Capacity Units |
| 369 | Female | 12 | 32 | 19 | 1 | 75 | 55 | 0.34 | 0.67 | 75 | 3 | −1.27 | −0.61 | −0.50 | −0.83 | 0.73 | −2.09 | 6 | 9 | 5 | 0 | 0 | Maths, Number, Basic Arithmetic, Written Subtraction |
| 526 | Male | 12 | 25 | 611 | 0 | 75 | 92 | 0.67 | 0.88 | 98 | 16 | −0.07 | 0.22 | 0.59 | 0.91 | 1.66 | 1.74 | 4 | 4 | 6 | 4 | 0 | Maths, Algebra, Writing and Simplifying Expressions, Dividing Terms |
| 565 | Female | 12 | 72 | 756 | 0 | 50 | 73 | 0.56 | 0.00 | 50 | 6 | 0.84 | −0.35 | −1.61 | −1.88 | −0.06 | −0.89 | 2 | 3 | 4 | 6 | 3 | Maths, Data and Statistics, Data Collection, Types of Data and Questionnaires |
| 704 | Male | 11 | 62 | 833 | 0 | 50 | 72 | 0.60 | 0.43 | 82 | 7 | 0.04 | 0.85 | −0.49 | −1.79 | 0.00 | −1.40 | 5 | 8 | 5 | 9 | 10 | Maths, Data and Statistics, Data Processing, Range and Interquartile Range from a List of Data |
| 804 | Female | 12 | 55 | 877 | 1 | 100 | 87 | 0.58 | 0.57 | 89 | 7 | −0.30 | −1.65 | 0.13 | −0.29 | −1.44 | −0.11 | 6 | 2 | 6 | 5 | 9 | Maths, Geometry and Measure, Units of Measurement, Time |
| 805 | Male | 12 | 5 | 789 | 0 | 100 | 89 | 0.40 | 0.00 | 100 | 1 | 1.26 | 0.06 | 1.41 | −0.84 | −0.41 | −1.31 | 9 | 8 | 9 | 7 | 2 | Maths, Geometry and Measure, 3D Shapes, Isometric Drawings |
| 1,079 | Male | 12 | 32 | 438 | 0 | 100 | 72 | 0.45 | 0.50 | 100 | 6 | −0.88 | 1.82 | −1.23 | 0.53 | 0.85 | −0.01 | 6 | 8 | 5 | 3 | 0 | Maths, Number, Basic Arithmetic, Mental Addition and Subtraction |
| 1,199 | Male | 11 | 62 | 314 | 1 | 100 | 69 | 0.52 | 1.00 | NA | 3 | −1.24 | −0.96 | 0.81 | 0.37 | −0.31 | −0.79 | 5 | 0 | 0 | 0 | 2 | Maths, Number, Indices, Powers and Roots, Squares, Cubes, etc |
| 1,672 | Female | 12 | 59 | 501 | 1 | 100 | 72 | 0.63 | 0.83 | 67 | 6 | 0.25 | 1.37 | −1.67 | 0.42 | 1.64 | −0.30 | 5 | 1 | 6 | 2 | 10 | Maths, Data and Statistics, Probability of Single Events, Probability |
| 1,694 | Female | 12 | 36 | 95 | 0 | 100 | 86 | 0.58 | 0.40 | 100 | 5 | 0.17 | −1.69 | 0.52 | 1.92 | 1.21 | −0.17 | 9 | 0 | 10 | 3 | 8 | Maths, Number, Indices, Powers and Roots, Algebra, Writing and Simplifying Expressions, Laws of Indices, Multiplying Terms, Dividing Terms |
| 1,764 | Female | 12 | 86 | 579 | 1 | 100 | 82 | 0.60 | 1.00 | 100 | 1 | 1.72 | 0.88 | −0.45 | 0.80 | 0.96 | 1.51 | 2 | 10 | 1 | 6 | 3 | Maths, Number, Basic Arithmetic, Place Value |
| 1,785 | Female | 12 | 39 | 169 | 0 | 0 | 50 | 0.84 | 0.86 | 50 | 7 | −0.35 | 0.20 | −1.63 | 0.25 | 0.00 | 0.04 | 7 | 10 | 4 | 10 | 5 | Maths, Data and Statistics, Pictogram, Data Representation |
| 1,790 | Male | 13 | 64 | 922 | 1 | 50 | 48 | 0.75 | 0.71 | 43 | 7 | −0.14 | 0.00 | 1.34 | −1.83 | −1.77 | −0.37 | 10 | 4 | 7 | 10 | 8 | Maths, Geometry and Measure, Angles, Angles in Triangles |
| 1,796 | Male | 12 | 109 | 904 | 0 | 0 | 67 | 0.45 | 0.27 | 66 | 11 | −0.19 | 0.12 | −1.88 | −0.99 | −0.12 | 0.22 | 4 | 8 | 4 | 1 | 10 | Maths, Number, Percentages, Repeated Percentages and Compound Interest |
| 1,802 | Male | 12 | 65 | 278 | 1 | 50 | 65 | 0.53 | 1.00 | 50 | 1 | 1.56 | −0.69 | 1.27 | −0.39 | −1.96 | −1.53 | 10 | 9 | 6 | 5 | 3 | Maths, Algebra, Straight Line Graphs, Parallel Lines |
| 1,947 | Male | 12 | 63 | 69 | 0 | 100 | 46 | 0.31 | 0.14 | NA | 7 | −0.87 | 0.05 | −0.37 | −0.48 | 0.81 | 2.22 | 1 | 5 | 2 | 8 | 9 | Maths, Algebra, Quadratic Equations, Solving Equations |
| 2,277 | Female | 12 | 37 | 30 | 1 | 50 | 71 | 0.59 | 0.56 | 66 | 16 | 1.29 | −0.23 | −1.73 | −0.89 | −0.60 | −1.74 | 9 | 5 | 4 | 3 | 3 | Maths, Algebra, Straight Line Graphs, Other Graphs, Equation of a Circle, Graphical Solution of Simultaneous Equations |
| 2,303 | Female | 12 | 95 | 734 | 0 | 50 | 45 | 0.56 | 0.80 | 50 | 5 | 0.70 | 0.03 | 0.21 | −0.30 | −0.07 | 0.26 | 1 | 3 | 7 | 9 | 7 | Maths, Number, Fractions, Fractions of an Amount |
| 2,553 | Female | 12 | 75 | 289 | 1 | 50 | 28 | 0.36 | 0.56 | NA | 9 | −0.99 | −1.14 | −0.93 | −0.20 | −0.70 | −0.61 | 3 | 8 | 1 | 1 | 1 | Maths, Geometry and Measure, Volume and Surface Area, Surface Area of Prisms |
| 2,700 | Female | 12 | 68 | 803 | 1 | 75 | 94 | 0.77 | 0.92 | 69 | 13 | −0.56 | −1.33 | −0.97 | 0.82 | 0.91 | −0.92 | 5 | 4 | 5 | 9 | 1 | Maths, Algebra, Rearranging Formula and Equations, Formula |
| 2,780 | Male | 12 | 53 | 616 | 0 | NA | 73 | 0.71 | 0.75 | NA | 4 | 1.12 | −0.84 | 0.13 | 2.34 | −2.53 | −0.59 | 10 | 1 | 8 | 6 | 10 | Maths, Geometry and Measure, Angles, Angles in Polygons |
| 2,801 | Male | 11 | 18 | 491 | 1 | 100 | 91 | 0.81 | 0.67 | 86 | 9 | −0.43 | −0.36 | −1.16 | −0.03 | −0.40 | 0.22 | 3 | 1 | 8 | 8 | 8 | Maths, Number, Indices, Powers and Roots, Squares, Cubes, etc, Square Roots, Cube Roots, etc |
| 2,828 | Female | 12 | 111 | 802 | 1 | 100 | 83 | 0.69 | 0.80 | 98 | 10 | −0.39 | 1.36 | −1.98 | 0.09 | 0.50 | 0.29 | 0 | 2 | 8 | 8 | 1 | Maths, Algebra, Linear Equations, Solving Equations |
| 2,881 | Male | 12 | 86 | 187 | 0 | NA | 67 | 0.60 | 0.80 | NA | 5 | −0.83 | −0.56 | 0.16 | 0.23 | 1.22 | 0.25 | 9 | 10 | 8 | 0 | 5 | Maths, Number, Fractions, Fractions of an Amount |
| 3,187 | Female | 12 | 39 | 93 | 0 | 75 | 60 | 0.60 | 0.54 | NA | 13 | −0.06 | −1.31 | 0.42 | 0.16 | −0.44 | 0.38 | 9 | 2 | 10 | 4 | 1 | Maths, Geometry and Measure, Area of Simple Shapes, Perimeter and Area, Compound Area, Maths-Others, Perimeter and Area-Others |
| 3,371 | Female | 12 | 17 | 791 | 1 | 75 | 67 | 0.44 | 0.43 | 61 | 7 | −1.87 | −0.72 | −1.50 | 3.04 | −0.89 | −0.77 | 2 | 8 | 3 | 5 | 3 | Maths, Data and Statistics, Histogram, Data Representation |
| 3,691 | Male | 12 | 73 | 284 | 1 | 50 | 78 | 0.86 | 1.00 | 72 | 8 | 0.56 | −0.59 | −0.63 | 0.64 | 0.98 | −0.31 | 5 | 6 | 4 | 1 | 4 | Maths, Algebra, Inequalities, Graphing Linear Inequalities (Shading Regions) |
| 3,764 | Female | 12 | 5 | 423 | 1 | 100 | 85 | 0.64 | 0.71 | 88 | 17 | 0.56 | −1.95 | 0.38 | 0.92 | 0.17 | −0.58 | 3 | 1 | 4 | 9 | 9 | Maths, Geometry and Measure, 3D Shapes, Names and Properties of 3D Shapes |
| 4,281 | Male | 12 | 9 | 868 | 0 | 0 | 48 | 0.49 | 0.75 | 6 | 4 | 1.44 | 0.66 | 0.74 | 0.49 | −0.57 | 2.93 | 8 | 9 | 0 | 4 | 4 | Maths, Data and Statistics, Pie Chart, Data Representation |
| 4,497 | Female | 12 | 111 | 761 | 1 | 100 | 92 | 0.90 | 0.93 | 98 | 14 | 0.98 | 0.29 | 0.67 | −0.95 | −0.60 | 1.90 | 2 | 8 | 0 | 6 | 3 | Maths, Number, Decimals, Adding and Subtracting with Decimals |
| 4,628 | Male | 12 | 6 | 235 | 0 | 100 | 26 | 0.40 | 0.70 | 88 | 10 | 0.16 | 1.35 | −0.35 | −1.47 | 0.09 | 0.97 | 1 | 6 | 4 | 7 | 10 | Maths, Number, Basic Arithmetic, Mental Multiplication and Division |
| 4,875 | Female | 11 | 50 | 553 | 1 | 75 | 73 | 0.69 | 0.86 | 75 | 7 | 1.79 | −1.00 | −1.15 | −0.25 | 0.10 | 1.42 | 0 | 5 | 0 | 5 | 8 | Maths, Number, Factors, Multiples and Primes, Multiples and Lowest Common Multiple |
| 5,041 | Male | 12 | 65 | 80 | 1 | 100 | 82 | 0.75 | 1.00 | 88 | 2 | −1.48 | −0.13 | 0.19 | −0.16 | −0.29 | −1.62 | 1 | 1 | 6 | 5 | 8 | Maths, Number, BIDMAS, Basic Arithmetic |
| 5,164 | Male | 11 | 78 | 298 | 1 | 75 | 73 | 0.54 | 0.80 | 80 | 5 | 0.41 | −0.22 | −2.93 | −0.08 | −1.34 | 1.11 | 1 | 7 | 0 | 8 | 5 | Maths, Geometry and Measure, Length Scale Factors in Similar Shapes, Similarity and Congruency |
| 5,174 | Female | 11 | 53 | 342 | 1 | 50 | 62 | 0.55 | 1.00 | 88 | 6 | −1.64 | 0.56 | −1.19 | −0.01 | 0.25 | 1.02 | 8 | 6 | 10 | 10 | 0 | Maths, Geometry and Measure, Perimeter, Perimeter and Area |
| 5,482 | Female | 12 | 78 | 567 | 1 | 100 | 79 | 0.57 | 0.40 | 100 | 5 | 0.98 | −0.48 | 0.02 | 0.12 | 0.53 | −0.37 | 2 | 10 | 2 | 7 | 10 | Maths, Number, Fractions, Fractions of an Amount |
| 5,549 | Female | 12 | 59 | 64 | 0 | 100 | 53 | 0.65 | 0.00 | 100 | 1 | 1.69 | 2.44 | −0.51 | 1.24 | 1.39 | 1.17 | 10 | 5 | 7 | 0 | 9 | Maths, Algebra, Quadratic Graphs, Plotting Quadratics from Tables of Values |
| 5,587 | Female | 12 | 49 | 346 | 0 | 75 | 73 | 0.55 | 0.60 | 60 | 5 | 0.82 | −0.75 | 1.07 | −0.76 | −0.76 | 0.46 | 3 | 6 | 5 | 0 | 5 | Maths, Geometry and Measure, Basic Trigonometry, Non Right-angled Triangles (Sine and Cosine Rules) |
| 5,655 | Female | 12 | 5 | 852 | 0 | 100 | 99 | 0.53 | 0.33 | 100 | 9 | 0.61 | 0.69 | 2.08 | −0.53 | −1.06 | 0.20 | 7 | 8 | 3 | 0 | 2 | Maths, Number, Fractions, Multiplying Fractions |
| 5,677 | Female | 12 | 66 | 6 | 1 | 75 | 50 | 0.50 | 0.57 | 62 | 14 | 0.68 | −1.49 | −0.03 | −0.58 | 0.08 | −0.75 | 7 | 6 | 9 | 4 | 10 | Maths, Algebra, Linear Equations, Solving Equations |
| 5,717 | Male | 12 | 55 | 565 | 1 | 100 | 91 | 0.62 | 0.80 | 100 | 5 | −1.12 | 1.18 | −0.92 | 0.94 | −1.53 | 1.51 | 2 | 1 | 5 | 4 | 1 | Maths, Number, Factors, Multiples and Primes, Multiples and Lowest Common Multiple |
| 5,762 | Male | 11 | 46 | 134 | 1 | 100 | 84 | 0.67 | 0.89 | NA | 9 | 0.19 | 1.76 | 0.11 | 1.40 | 0.73 | −0.15 | 0 | 1 | 0 | 0 | 3 | Maths, Geometry and Measure, Angles, Measuring Angles |
| 5,855 | Male | 12 | 103 | 432 | 1 | 100 | 89 | 0.65 | 0.83 | NA | 6 | −0.92 | −2.71 | −0.80 | 0.33 | −0.92 | 0.63 | 0 | 8 | 0 | 6 | 10 | Maths, Geometry and Measure, Angles, Congruency in Triangles, Similarity and Congruency, Angles in Triangles |
| 5,917 | Female | 12 | 109 | 219 | 1 | 75 | 69 | 0.88 | 1.00 | NA | 8 | 0.31 | 1.94 | 1.33 | 1.60 | 0.12 | −1.20 | 6 | 9 | 2 | 6 | 0 | Maths, Number, Upper and Lower Bounds, Rounding and Estimating |
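Several rows in this table show `NA` for `cum_percent_confident`. That is expected from `cumsum(Confidence)/row_number()`: once a student skips a confidence rating, `cumsum()` returns `NA` for every later row in that quiz. A minimal sketch of an NA-aware running mean, assuming the intent is to average only the ratings actually given (the toy values and the column names `cum_naive` and `cum_na_safe` are hypothetical):

```r
library(dplyr)

# Toy data: one student who skipped the second confidence rating
toy <- tibble(
  UserId = c(1, 1, 1, 1),
  Confidence = c(50, NA, 100, 75)
)

running <- toy %>%
  group_by(UserId) %>%
  mutate(
    # approach used above: NA propagates to all later rows
    cum_naive = cumsum(Confidence) / row_number(),
    # NA-aware: sum observed ratings, divide by the number observed so far
    cum_na_safe = cumsum(coalesce(Confidence, 0)) / cumsum(!is.na(Confidence))
  ) %>%
  ungroup()

running
```

If the very first rating is missing, the NA-aware version gives `NaN` (0/0) until a rating is observed, so that case may still need handling downstream.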
And here are the sampled problems
We can use the following code to generate prompts for the students.
library(glue)
generate_prompt <- function(question_text, subject_text, IsCorrect, Confidence, problems_so_far,
cum_percent_confident, cum_percent_correct, student_age, Gender, iq, extraversion,
agreeableness, conscientiousness, neuroticism, openness, confidence,
frustration, boredom, curiosity, engagement) {
prompt <- glue("
You are a simulator for an intelligent tutoring system. I will tell you a math question a student is working on, some information about the student and their interaction with the question, and you will generate a conversation between the student and the intelligent tutor. I will tell you some psychological characteristics of the student, but the tutor does not know them explicitly, they are there just so that you can more accurately model the student. (All psychological variables are specified as z-scores, such that 0 is the mean and 1 is one standard deviation above the mean). Make sure that the student is asking for some help.
This question:
Question text: {question_text}
The question is about: {subject_text}
Did the student get the question right? {ifelse(IsCorrect == 0, 'No', 'Yes')}
What was the student's confidence for this question? {Confidence}
About the student's learning session so far:
This student has worked on {problems_so_far} problems so far.
The student's average confidence so far is {cum_percent_confident %>% round(0)}.
The student has answered {(cum_percent_correct*100) %>% round(0)}% questions correctly so far.
About the student:
Student age: {student_age %>% round(0)}
Student gender: {Gender}
Student IQ: {iq %>% round(3)}
Student Extraversion: {extraversion%>% round(3)}
Student Agreeableness: {agreeableness%>% round(3)}
Student Conscientiousness: {conscientiousness%>% round(3)}
Student Neuroticism: {neuroticism%>% round(3)}
Student Openness: {openness%>% round(3)}
And this is the student's current motivational state. These are in a scale of 0 - 10
Confidence: {confidence}
Frustration: {frustration}
Boredom: {boredom}
Curiosity/Interest: {curiosity}
Engagement: {engagement}
")
return(prompt)
}
library(purrr)
data50 = data50 %>%
mutate(question_text = "to be transcribed") %>%
mutate(prompt = pmap(list(question_text, subject_text, IsCorrect, Confidence, problems_so_far,
cum_percent_confident, cum_percent_correct, student_age, Gender, iq, extraversion,
agreeableness, conscientiousness, neuroticism, openness, confidence,
frustration, boredom, curiosity, engagement), generate_prompt))
We now have the 50 prompts generated!
data50 %>%
select(prompt) %>%
head() %>%
gt()
| prompt |
|---|
| You are a simulator for an intelligent tutoring system. I will tell you a math question a student is working on, some information about the student and their interaction with the question, and you will generate a conversation between the student and the intelligent tutor. I will tell you some psychological characteristics of the student, but the tutor does not know them explicitly, they are there just so that you can more accurately model the student. (All psychological variables are specified as z-scores, such that 0 is the mean and 1 is one standard deviation above the mean). Make sure that the student is asking for some help. This question: Question text: to be transcribed The question is about: Maths, Number, Geometry and Measure, Units of Measurement, Rounding and Estimating, Length Units, Estimation Did the student get the question right? No What was the student's confidence for this question? 50 About the student's learning session so far: This student has worked on 7 problems so far. The student's average confidence so far is NA. The student has answered 71% questions correctly so far. About the student: Student age: 12 Student gender: Female Student IQ: 0.057 Student Extraversion: 1.596 Student Agreeableness: -1.078 Student Conscientiousness: 2.106 Student Neuroticism: -1.378 Student Openness: 1.589 And this is the student's current motivational state. These are in a scale of 0 - 10 Confidence: 10 Frustration: 6 Boredom: 6 Curiosity/Interest: 2 Engagement: 3 |
| You are a simulator for an intelligent tutoring system. I will tell you a math question a student is working on, some information about the student and their interaction with the question, and you will generate a conversation between the student and the intelligent tutor. I will tell you some psychological characteristics of the student, but the tutor does not know them explicitly, they are there just so that you can more accurately model the student. (All psychological variables are specified as z-scores, such that 0 is the mean and 1 is one standard deviation above the mean). Make sure that the student is asking for some help. This question: Question text: to be transcribed The question is about: Maths, Geometry and Measure, Perimeter, Perimeter and Area Did the student get the question right? No What was the student's confidence for this question? 75 About the student's learning session so far: This student has worked on 2 problems so far. The student's average confidence so far is 75. The student has answered 0% questions correctly so far. About the student: Student age: 11 Student gender: Male Student IQ: 2.159 Student Extraversion: -1.089 Student Agreeableness: 0.894 Student Conscientiousness: -0.446 Student Neuroticism: -0.782 Student Openness: 0.452 And this is the student's current motivational state. These are in a scale of 0 - 10 Confidence: 5 Frustration: 1 Boredom: 5 Curiosity/Interest: 3 Engagement: 3 |
| You are a simulator for an intelligent tutoring system. I will tell you a math question a student is working on, some information about the student and their interaction with the question, and you will generate a conversation between the student and the intelligent tutor. I will tell you some psychological characteristics of the student, but the tutor does not know them explicitly, they are there just so that you can more accurately model the student. (All psychological variables are specified as z-scores, such that 0 is the mean and 1 is one standard deviation above the mean). Make sure that the student is asking for some help. This question: Question text: to be transcribed The question is about: Maths, Algebra, Algebraic Fractions, Adding and Subtracting Algebraic Fractions Did the student get the question right? No What was the student's confidence for this question? 75 About the student's learning session so far: This student has worked on 6 problems so far. The student's average confidence so far is 79. The student has answered 50% questions correctly so far. About the student: Student age: 12 Student gender: Male Student IQ: -0.477 Student Extraversion: -0.485 Student Agreeableness: -0.191 Student Conscientiousness: -0.439 Student Neuroticism: -0.072 Student Openness: 0.878 And this is the student's current motivational state. These are in a scale of 0 - 10 Confidence: 7 Frustration: 6 Boredom: 9 Curiosity/Interest: 8 Engagement: 0 |
| You are a simulator for an intelligent tutoring system. I will tell you a math question a student is working on, some information about the student and their interaction with the question, and you will generate a conversation between the student and the intelligent tutor. I will tell you some psychological characteristics of the student, but the tutor does not know them explicitly, they are there just so that you can more accurately model the student. (All psychological variables are specified as z-scores, such that 0 is the mean and 1 is one standard deviation above the mean). Make sure that the student is asking for some help. This question: Question text: to be transcribed The question is about: Maths, Algebra, Straight Line Graphs, Parallel Lines Did the student get the question right? No What was the student's confidence for this question? 25 About the student's learning session so far: This student has worked on 10 problems so far. The student's average confidence so far is 58. The student has answered 50% questions correctly so far. About the student: Student age: 12 Student gender: Male Student IQ: -0.996 Student Extraversion: 0.803 Student Agreeableness: 0.358 Student Conscientiousness: 2.004 Student Neuroticism: 1.651 Student Openness: -0.643 And this is the student's current motivational state. These are in a scale of 0 - 10 Confidence: 3 Frustration: 2 Boredom: 9 Curiosity/Interest: 6 Engagement: 6 |
| You are a simulator for an intelligent tutoring system. I will tell you a math question a student is working on, some information about the student and their interaction with the question, and you will generate a conversation between the student and the intelligent tutor. I will tell you some psychological characteristics of the student, but the tutor does not know them explicitly, they are there just so that you can more accurately model the student. (All psychological variables are specified as z-scores, such that 0 is the mean and 1 is one standard deviation above the mean). Make sure that the student is asking for some help. This question: Question text: to be transcribed The question is about: Maths, Geometry and Measure, Angles, Bearings Did the student get the question right? No What was the student's confidence for this question? 50 About the student's learning session so far: This student has worked on 4 problems so far. The student's average confidence so far is 81. The student has answered 75% questions correctly so far. About the student: Student age: 12 Student gender: Female Student IQ: 0.625 Student Extraversion: 0.771 Student Agreeableness: -0.514 Student Conscientiousness: 0.339 Student Neuroticism: 1.052 Student Openness: 0.958 And this is the student's current motivational state. These are in a scale of 0 - 10 Confidence: 7 Frustration: 6 Boredom: 0 Curiosity/Interest: 8 Engagement: 5 |
| You are a simulator for an intelligent tutoring system. I will tell you a math question a student is working on, some information about the student and their interaction with the question, and you will generate a conversation between the student and the intelligent tutor. I will tell you some psychological characteristics of the student, but the tutor does not know them explicitly, they are there just so that you can more accurately model the student. (All psychological variables are specified as z-scores, such that 0 is the mean and 1 is one standard deviation above the mean). Make sure that the student is asking for some help. This question: Question text: to be transcribed The question is about: Maths, Geometry and Measure, Units of Measurement, Volume and Capacity Units Did the student get the question right? No What was the student's confidence for this question? 75 About the student's learning session so far: This student has worked on 5 problems so far. The student's average confidence so far is 75. The student has answered 80% questions correctly so far. About the student: Student age: 12 Student gender: Male Student IQ: 0.24 Student Extraversion: 1.13 Student Agreeableness: 0.839 Student Conscientiousness: 0.627 Student Neuroticism: 1.855 Student Openness: 1.09 And this is the student's current motivational state. These are in a scale of 0 - 10 Confidence: 3 Frustration: 0 Boredom: 1 Curiosity/Interest: 8 Engagement: 9 |
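Note that the first prompt reads "The student's average confidence so far is NA", because the missing value is interpolated as-is. One possible way to avoid passing a literal `NA` to the model is a small formatting helper. This is only a sketch: `fmt_or_unknown()` and its fallback wording are hypothetical and not part of the analysis above.

```r
# Hypothetical helper: round a value for display, or return a fallback
# phrase when the value is missing.
fmt_or_unknown <- function(x, digits = 0, fallback = "not recorded") {
  ifelse(is.na(x), fallback, as.character(round(x, digits)))
}

fmt_or_unknown(78.6)      # a rounded value, as text
fmt_or_unknown(NA_real_)  # the fallback phrase
```

Inside the template, `{cum_percent_confident %>% round(0)}` could then be written as `{fmt_or_unknown(cum_percent_confident)}`.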